Prompt Engineering a Prompt Engineer
Prompt engineering is a challenging yet crucial task for optimizing the
performance of large language models (LLMs). It requires complex reasoning to
examine the model's errors, hypothesize what is missing or misleading in the
current prompt, and communicate the task with clarity. While recent works
indicate that LLMs can be meta-prompted to perform automatic prompt
engineering, their potential may not be fully tapped due to the lack of
sufficient guidance to elicit complex reasoning capabilities in LLMs in the
meta-prompt. In this work, we investigate the problem of "prompt engineering a
prompt engineer" -- constructing a meta-prompt that more effectively guides
LLMs to perform automatic prompt engineering. We introduce and analyze key
components, such as a step-by-step reasoning template and context
specification, which lead to improved performance. In addition, inspired by
common optimization concepts such as batch size, step size and momentum, we
introduce their verbalized counterparts to the meta-prompt and investigate
their effects. Our final method, named PE2, finds a prompt that outperforms
"let's think step by step" by 6.3% on the MultiArith dataset and 3.1% on the
GSM8K dataset. To demonstrate its versatility, we apply PE2 to the Instruction
Induction benchmark, a suite of counterfactual tasks, and a lengthy, real-world
industrial prompt. In these settings, PE2 achieves strong performance and
outperforms prior automatic prompt engineering baselines. Further, we show that
PE2 makes meaningful and targeted prompt edits, amends erroneous or incomplete
prompts, and exhibits non-trivial counterfactual reasoning abilities.
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks
Recently, there has been a rapid advancement in research on Large Language
Models (LLMs), resulting in significant progress in several Natural Language
Processing (NLP) tasks. Consequently, there has been a surge in LLM evaluation
research to comprehend the models' capabilities and limitations. However, much
of this research has been confined to the English language, leaving LLM
building and evaluation for non-English languages relatively unexplored.
Several new LLMs have since been introduced, and they too need to be evaluated
on non-English languages. This study expands our MEGA benchmarking suite by
including six new datasets to form the MEGAVERSE benchmark. The benchmark
comprises 22 datasets covering 81 languages, including low-resource African
languages. We evaluate several state-of-the-art LLMs like GPT-3.5-Turbo, GPT4,
PaLM2, and Llama2 on the MEGAVERSE datasets. Additionally, we include two
multimodal datasets in the benchmark and assess the performance of the
LLaVa-v1.5 model. Our experiments suggest that GPT4 and PaLM2 outperform the
Llama models on various tasks, notably on low-resource languages, with GPT4
outperforming PaLM2 on more datasets than vice versa. However, issues such as
data contamination must be addressed to obtain an accurate assessment of LLM
performance on non-English languages.
Comment: 23 pages, 30 figures and 1 table
MEGA: Multilingual Evaluation of Generative AI
Generative AI models have shown impressive performance on many Natural
Language Processing tasks such as language understanding, reasoning, and
language generation. An important question being asked by the AI community
today is about the capabilities and limits of these models, and it is clear
that evaluating generative AI is very challenging. Most studies on generative
LLMs have been restricted to English and it is unclear how capable these models
are at understanding and generating text in other languages. We present the
first comprehensive benchmarking of generative LLMs - MEGA, which evaluates
models on standard NLP benchmarks, covering 16 NLP datasets across 70
typologically diverse languages. We compare the performance of generative LLMs
including Chat-GPT and GPT-4 to State of the Art (SOTA) non-autoregressive
models on these tasks to determine how well generative models perform compared
to the previous generation of LLMs. We present a thorough analysis of the
performance of models across languages and tasks and discuss challenges in
improving the performance of generative LLMs on low-resource languages. We
create a framework for evaluating generative LLMs in the multilingual setting
and provide directions for future progress in the field.
Comment: EMNLP 202
Saynis 1
The main purpose of this book is to develop students' ability to observe and investigate their surroundings, and to give them scientific knowledge that can serve as a foundation for future science learning.
Saynis. Buugga 4
The main purpose of this book is to encourage pupils to observe and analyze the environment they live in, and to give them scientific knowledge that can serve as a basis for future science learning.
Saynis. Buugga 3
Health education aims to change pupils' behaviour so that they fulfil their responsibility to look after their own health and that of the society around them.
Saynis - Buugga 2
A science book for pupils in the second year of primary school, based on new methods for teaching science to children. It helps children understand that science is learning connected to the things around them, which they see or hear every day.
Waxbarashada Caafimaadka. Buugga 3
The third book in a series for primary school pupils. It uses pictures and explanations to teach hygiene and the behaviour essential for staying healthy and living with others in the pupil's community.
Xisaab 3
An illustrated mathematics and geometry textbook for pupils in the third year of primary school, covering thousands, fractions, and units of measure.